Bayesian Ability Estimation via 3PL with Partially Known Item Parameters

ROBERT  K.  TSUTAKAWA  AND  JANE  JOHNSON 

Abstract 

The  conventional  method  of  measuring  ability,  which  is  based  on  items  with  true 
parameters  assumed  to  have  values  estimated  by  a  pretest,  is  compared  to  a  Bayesian 
method  which  deals  with  the  uncertainties  of  such  items.  Computational  expressions  are 
presented  for  approximating  the  posterior  mean  and  variance  of  ability  under  the 
three-parameter  logistic  (3PL)  model.  A  1987  ACT  math  test  is  used  to  demonstrate  that 
the  standard  practice  of  using  maximum  likelihood  or  empirical  Bayes  techniques  may 
seriously  underestimate  the  uncertainty  in  the  estimated  ability  when  the  pretest  sample  is 
only  moderately  large. 

Key  Words:  ability  estimation;  Bayesian  IRT;  calibration;  pretest;  three-parameter 
logistic. 


INTRODUCTION 


A standard practice in mental testing is to score individuals using the responses to a
set  of  test  items  which  have  previously  been  calibrated.  When  latent  trait  models  are 
employed,  the  calibration  involves  estimating  parameters  of  the  model  using  a  moderately 
large  sample.  The  estimated  parameters  are  then  assumed  to  be  the  true  values  when  the 
scoring  is  performed. 

Even when the assumed model is correct, there are two sources of error in this
process.  One  is  due  to  the  responses  of  the  individuals  being  scored  and  the  other  is  due  to 
the  error  in  the  calibration.  Ignoring  the  second  source  could  lead  to  inferential  errors, 
particularly  when  the  calibrating  sample  is  not  large.  In  many  areas  of  testing  large 
samples  may  not  be  readily  available  for  calibration.  Moreover,  disclosure  laws  commonly 
require  public  dissemination  of  tests,  making  it  necessary  to  have  more  items  while  the  pool 
from which to draw the calibrating sample is limited.

This paper deals with the problem of estimating ability when there is uncertainty
concerning  the  item  parameters  due  to  the  limited  size  of  the  calibrating  sample.  Because 
of  the  sequential  nature  of  first  calibrating  the  test  and  then  using  it  on  the  target 
population,  the  Bayesian  paradigm  for  statistical  inference  is  particularly  attractive.  This 
paper discusses how the uncertainty in the item parameters may be incorporated into the
estimation and uncertainty of the abilities being measured.

The main idea will be demonstrated in terms of the three-parameter logistic model
(3PL), which was introduced by Birnbaum (1968). The model specifies that the probability
of a correct response by an individual with real-valued ability $\theta$ to a given item has the
form

$$P_\xi(\theta) = c + \frac{1-c}{1+\exp\{-a(\theta-b)\}}, \tag{1}$$

where $\xi = (a,b,c)$ is an unknown item parameter, subject to $a>0$, $-\infty<b<\infty$, and $0<c<1$. It
will be assumed that a test consisting of K items has already been given to a calibrating
sample of n individuals and that the posterior mean or mode of the item parameters is
already available. (See Mislevy & Bock, 1984, or Tsutakawa, 1988, for algorithms to
compute the posterior mode.)
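
As a minimal illustration of the item response function (1), the following Python sketch evaluates the 3PL probability for a single item. The function name `p3pl` and the item values are hypothetical and are not taken from the ACT data; the check also admits c = 0 so that the 2PL limiting case can be evaluated.

```python
import math

def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (1)."""
    if not (a > 0 and 0 <= c < 1):
        raise ValueError("need a > 0 and 0 <= c < 1")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination a=1.2, difficulty b=0.5, guessing c=0.2.
for theta in (-3.0, 0.5, 3.0):
    print(theta, round(p3pl(theta, 1.2, 0.5, 0.2), 4))
```

At $\theta = b$ the probability equals $c + (1-c)/2$, and as $\theta$ decreases it approaches the lower asymptote $c$.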

The uncertainty in the calibrated item parameters will be summarized in terms of
the posterior covariance matrix, which is approximated by the inverse Hessian of the
negative log posterior evaluated at the mode. The mode and covariance matrix will then
be used in the approximation of the posterior mean and variance of ability presented by
Tsutakawa & Soltys (1988) for the case of the two-parameter logistic model (2PL), a
limiting case of 3PL when $c=0$ in (1).

The method will be illustrated on data from a 1987 American College Testing
Program (ACT) math test. The results will be compared with the more conventional
approaches using maximum likelihood as implemented by LOGIST (Wingersky, Barton, &
Lord, 1982) and empirical Bayes based on item parameters estimated by marginal
maximum likelihood (Bock & Aitkin, 1981, and Tsutakawa, 1988). The main conclusion of
the paper is that when there is uncertainty in the item parameters, both maximum
likelihood and empirical Bayes underestimate the variance of ability and therefore produce
interval estimates which are too narrow and misleading.


General  Setup  and  Alternative  Solutions 

Consider a K-item test where the items are scored $x_j = 0$ or 1 according as the answer
to the jth item is incorrect or correct, $j=1,\ldots,K$. Assume local independence so that the
probability of the response vector $x = (x_1,\ldots,x_K)$ for an individual with ability $\theta$ is

$$p_\xi(x|\theta) = \prod_{j=1}^{K} P_{\xi_j}(\theta)^{x_j}\{1-P_{\xi_j}(\theta)\}^{1-x_j}, \tag{2}$$

where $\xi = (\xi_1,\ldots,\xi_K)$ is the set of item parameters for the K items.
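
The product in (2) can be sketched as follows, assuming each item is stored as a tuple (a, b, c). The helper names and the three items are hypothetical and serve only to illustrate the local independence assumption.

```python
import math

def p3pl(theta, a, b, c):
    """3PL probability of a correct response, equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(x, theta, items):
    """p_xi(x | theta) in (2): product over items of P^x * (1 - P)^(1 - x)."""
    like = 1.0
    for xj, (a, b, c) in zip(x, items):
        p = p3pl(theta, a, b, c)
        like *= p if xj == 1 else 1.0 - p
    return like

# Hypothetical (a, b, c) values for a 3-item test.
items = [(1.0, -1.0, 0.20), (1.5, 0.0, 0.25), (0.8, 1.0, 0.15)]
print(pattern_likelihood([1, 1, 0], 0.3, items))
```

For fixed $\theta$ the values of `pattern_likelihood` over all $2^K$ response patterns sum to one, which is a convenient sanity check on any implementation.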

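For calibration, assume there are n individuals with abilities $\theta = (\theta_1,\ldots,\theta_n)$ sampled
from a N(0,1) distribution. If $y_i = (y_{i1},\ldots,y_{iK})$ is the response vector of the ith individual
and $y = (y_1,\ldots,y_n)$ the response matrix of the n individuals, the joint distribution of $(y,\theta)$
is given by

$$p(y,\theta|\xi) = \prod_{i=1}^{n} p_\xi(y_i|\theta_i)\phi(\theta_i), \tag{3}$$

where $p_\xi(y_i|\theta_i)$ is defined by (2) and $\phi$ is the N(0,1) pdf. The marginal distribution of y
then has probability function

$$p(y|\xi) = \int\!\cdots\!\int \prod_{i=1}^{n} p_\xi(y_i|\theta_i)\phi(\theta_i)\,d\theta_1\cdots d\theta_n. \tag{4}$$

If $p(\xi)$ is a prior pdf of $\xi$, the posterior of $\xi$ is simply

$$p(\xi|y) \propto p(\xi)p(y|\xi). \tag{5}$$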
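
Because the abilities are independent, the multiple integral in (4) factors into n one-dimensional integrals, each of which can be approximated numerically. The sketch below uses a simple trapezoid rule on the interval [-6, 6]; this rule, the grid size, and all names are assumptions made for illustration and are not the quadrature method of Tsutakawa (1984).

```python
import math

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(x, theta, items):
    like = 1.0
    for xj, (a, b, c) in zip(x, items):
        p = p3pl(theta, a, b, c)
        like *= p if xj == 1 else 1.0 - p
    return like

def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def marginal_prob(x, items, lo=-6.0, hi=6.0, n=241):
    """Trapezoid approximation to the integral of p_xi(x|theta) * phi(theta)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        t = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * pattern_likelihood(x, t, items) * normal_pdf(t)
    return total * h

def log_marginal_likelihood(y, items):
    """log p(y | xi) in (4): sum over examinees of log marginal probabilities."""
    return sum(math.log(marginal_prob(yi, items)) for yi in y)

# Hypothetical items and a tiny hypothetical calibration sample.
items = [(1.0, -1.0, 0.20), (1.5, 0.0, 0.25), (0.8, 1.0, 0.15)]
y = [[1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(log_marginal_likelihood(y, items))
```

Adding the log prior of $\xi$ to `log_marginal_likelihood` gives the log posterior in (5) up to an additive constant.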

The main problem to be addressed in this paper is the estimation of the ability $\theta$ of
a new individual with response vector x, when we are given y, the data from the
calibration.

When $\xi$ is known, a standard method of estimating $\theta$ is by maximum likelihood
(ML), i.e., finding the value of $\theta$ which maximizes the likelihood function $\ell(\theta|\xi) = p_\xi(x|\theta)$.
In this case the variance of the ML estimator $\hat\theta$ may be approximated by the inverse of the
test information function, which is given for 3PL (Lord, 1980, p. 73) by

$$I(\theta) = \sum_{j=1}^{K} \frac{a_j^2(1-c_j)}{\{c_j+\exp(z_j)\}\{1+\exp(-z_j)\}^2}, \tag{6}$$

where $z_j = a_j(\theta-b_j)$ and $(a_j,b_j,c_j) = \xi_j$. In the absence of known $\xi$, a common practice is
to replace $\xi$ by $\hat\xi_J$, a component of the joint ML estimate $(\hat\xi_J,\hat\theta_J)$ based on the likelihood
function

$$\ell(\xi,\theta) = \prod_{i=1}^{n} p_\xi(y_i|\theta_i), \tag{7}$$

and to estimate the unknown new $\theta$ by the value of $\theta$ which maximizes the conditional
likelihood function $\ell(\theta|\hat\xi_J) = p_{\hat\xi_J}(x|\theta)$.

When $\xi$ is known, the standard Bayesian method (Birnbaum, 1969) of estimating $\theta$
is in terms of the posterior mean

$$\tilde\theta = E(\theta|x,\xi) = \int \theta\,p(\theta|x,\xi)\,d\theta, \tag{8}$$

where, by Bayes theorem,

$$p(\theta|x,\xi) = \frac{p_\xi(x|\theta)\phi(\theta)}{\int p_\xi(x|\theta)\phi(\theta)\,d\theta}. \tag{9}$$

In this case the measure of uncertainty is the posterior variance,

$$\tilde\sigma^2 = V(\theta|x,\xi) = \int(\theta-\tilde\theta)^2 p(\theta|x,\xi)\,d\theta. \tag{10}$$

(See Lord (1986) for an interesting comparison of the posterior mean and ML estimate of $\theta$
when $\xi$ is known.)
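
The posterior mean (8) and variance (10) for known item parameters can be approximated on a grid, as in the sketch below. The trapezoid rule, the grid limits, and the names are assumptions for illustration; any adequate quadrature could be substituted.

```python
import math

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def posterior_mean_var(x, items, lo=-6.0, hi=6.0, n=241):
    """Mean and variance of p(theta | x, xi) under a N(0,1) prior, as in (8)-(10)."""
    h = (hi - lo) / (n - 1)
    s0 = s1 = s2 = 0.0
    for k in range(n):
        t = lo + k * h
        # Unnormalized posterior ordinate; the normalizing constants cancel.
        w = (0.5 if k in (0, n - 1) else 1.0) * math.exp(-0.5 * t * t)
        for xj, (a, b, c) in zip(x, items):
            p = p3pl(t, a, b, c)
            w *= p if xj == 1 else 1.0 - p
        s0 += w
        s1 += w * t
        s2 += w * t * t
    mean = s1 / s0
    return mean, s2 / s0 - mean * mean

# Hypothetical (a, b, c) values, treated as known.
items = [(1.0, -1.0, 0.20), (1.5, 0.0, 0.25), (0.8, 1.0, 0.15)]
print(posterior_mean_var([1, 1, 0], items))
```

With an empty test the function returns the prior mean 0 and variance 1, and plugging an estimate of $\xi$ into `items` gives the plug-in scoring in which the estimate is treated as the true value.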
In the absence of a known $\xi$, an empirical Bayes (EB) solution would be to replace $\xi$
in (8) and (10) by the marginal maximum likelihood estimate $\hat\xi$, based on (4). This type
of approach has been criticized by Deely & Lindley (1981) for its failure to account for the
error in $\hat\xi$.

When $\xi$ is unknown, the Bayesian solution is through the posterior distribution of $\theta$
given the data $z = (x,y)$. The pdf of this distribution is given by

$$p(\theta|z) = \int p(\theta|z,\xi)p(\xi|z)\,d\xi, \tag{11}$$

where, from the conditional independence of x and y given $\xi$,

$$p(\xi|z) = p(x|\xi)p(\xi|y)/p(x|y) \tag{12}$$

and

$$p(\theta|z,\xi) = p_\xi(x|\theta)\phi(\theta)/p(x|\xi). \tag{13}$$

Substituting (12) and (13) into (11) we have

$$p(\theta|z) = \frac{\phi(\theta)}{p(x|y)}\int p_\xi(x|\theta)p(\xi|y)\,d\xi, \tag{14}$$

where we can now see how $p(\xi|y)$ serves as the prior for $\xi$ subsequent to the calibration.
The posterior mean and variance of $\theta$ can be similarly expressed as

$$\mu = \int E(\theta|x,\xi)p(\xi|z)\,d\xi \tag{15}$$

$$\sigma^2 = \int E\{(\theta-\mu)^2|x,\xi\}p(\xi|z)\,d\xi. \tag{16}$$

These expressions are difficult to work with numerically. Practical approximations are
presented in the next section.


It is instructive to consider a decomposition of the posterior variance (16) in order
to identify the sources of variability. Using a well known identity (e.g. DeGroot, 1986, p.
225) we have

$$\begin{aligned}
\sigma^2 &= \int V(\theta|\xi,z)p(\xi|z)\,d\xi + \int\{E(\theta|\xi,z)-E(\theta|z)\}^2 p(\xi|z)\,d\xi \\
&= \int V(\theta|\xi,x)p(\xi|z)\,d\xi + \int\{E(\theta|\xi,x)-E(\theta|z)\}^2 p(\xi|z)\,d\xi,
\end{aligned} \tag{17}$$

where we have used the fact that $(\theta,x)$ and y are conditionally independent given $\xi$. Thus
the posterior variance of $\theta$ may be interpreted as the average conditional posterior variance
of $\theta$ given $\xi$ plus the variance of the conditional posterior expectation of $\theta$ given $\xi$ with
respect to the posterior uncertainty in $\xi$. The empirical Bayes variance approximates the
first integral in (17) with (10) by replacing $\xi$ with $\hat\xi$ but ignores the second. The second
term is important when $\xi$ is ill-determined after observing z.
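
The two components of the decomposition can be illustrated numerically by replacing the posterior of the item parameters with a small discrete distribution. In the sketch below the two parameter sets, their equal weights, and all function names are hypothetical stand-ins chosen only to show how the "within" and "between" terms are formed; they are not drawn from any calibration.

```python
import math

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def posterior_mean_var(x, items, lo=-6.0, hi=6.0, n=241):
    """Mean and variance of p(theta | x, xi) on a trapezoid grid, N(0,1) prior."""
    h = (hi - lo) / (n - 1)
    s0 = s1 = s2 = 0.0
    for k in range(n):
        t = lo + k * h
        w = (0.5 if k in (0, n - 1) else 1.0) * math.exp(-0.5 * t * t)
        for xj, (a, b, c) in zip(x, items):
            p = p3pl(t, a, b, c)
            w *= p if xj == 1 else 1.0 - p
        s0 += w
        s1 += w * t
        s2 += w * t * t
    mean = s1 / s0
    return mean, s2 / s0 - mean * mean

def decompose(x, xi_values, weights):
    """Return (mu, within, between) for a discrete stand-in for p(xi | z)."""
    stats = [posterior_mean_var(x, items) for items in xi_values]
    mu = sum(w * m for w, (m, _) in zip(weights, stats))
    within = sum(w * v for w, (_, v) in zip(weights, stats))
    between = sum(w * (m - mu) ** 2 for w, (m, _) in zip(weights, stats))
    return mu, within, between

# Hypothetical: two equally weighted parameter sets for a 3-item test.
xi_values = [
    [(1.0, -0.5, 0.20), (1.4, 0.0, 0.25), (0.9, 0.8, 0.15)],
    [(1.3, -0.2, 0.15), (1.0, 0.3, 0.30), (1.2, 0.5, 0.20)],
]
weights = [0.5, 0.5]
mu, within, between = decompose([1, 1, 0], xi_values, weights)
print(mu, within, between, within + between)
```

The sum `within + between` is the posterior variance of $\theta$ under this discrete stand-in, whereas a plug-in analysis at a single parameter value reports only a conditional variance and omits the `between` term.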


Bayesian  Approximation 

The approximation used to compute the posterior mean and variance of $\theta$ under the
2PL model by Tsutakawa & Soltys (1988) can be modified for 3PL. The approximation is a
special case of Lindley's (1980) approximation to the posterior mean of a function of
hyperparameters.

Suppose $w(\xi)$ is a function of the item parameter $\xi$ whose expectation we wish to
evaluate. Let $\hat w$ and $\hat w_{rs}$ denote the value of $w(\xi)$ and its second partial derivatives
evaluated at the posterior mode $\hat\xi$. Let $\tau_{rs}$ denote the elements of the approximate
posterior covariance matrix of $\xi$, where the approximation used is the inverse Hessian of
the negative log posterior evaluated at $\hat\xi$. The approximation is then given by

$$\tilde w = \hat w + \tfrac{1}{2}\sum_{r}\sum_{s}\hat w_{rs}\tau_{rs}. \tag{18}$$

When the posterior of $\xi$ is normal, this is Lindley's approximation. For a heuristic
justification of $\tilde w$ and discussion on other Bayesian approximations see Tsutakawa &
Soltys (1988).

To approximate the posterior mean of $\theta$, use $w(\xi) = E(\theta|x,\xi)$ in (18). If the
approximate mean has value m, the posterior variance of $\theta$ is similarly computed by using
(18) again with $w(\xi) = E\{(\theta-m)^2|x,\xi\}$.
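
The second-order correction in (18), that is, the value of w at the mode plus one half of the sum of its second partial derivatives times the corresponding covariance elements, can be sketched generically as below. The paper obtains the second derivatives analytically (Appendix B); the finite-difference Hessian used here is a substitute chosen for brevity, and the function name, step size, and test function are hypothetical.

```python
def second_order_mean(w, mode, cov, h=1e-3):
    """Approximate E[w(xi)] by w(mode) + 0.5 * sum_rs w_rs(mode) * cov[r][s]."""
    n = len(mode)

    def shifted(r, s, dr, ds):
        # Evaluate w with coordinates r and s moved by dr*h and ds*h.
        y = list(mode)
        y[r] += dr * h
        y[s] += ds * h
        return w(y)

    correction = 0.0
    for r in range(n):
        for s in range(n):
            # Central-difference estimate of the second partial derivative.
            w_rs = (shifted(r, s, 1, 1) - shifted(r, s, 1, -1)
                    - shifted(r, s, -1, 1) + shifted(r, s, -1, -1)) / (4.0 * h * h)
            correction += w_rs * cov[r][s]
    return w(mode) + 0.5 * correction

# Hypothetical check on a quadratic, for which the approximation is exact
# when the distribution has the given mean and covariance.
quad = lambda v: v[0] ** 2 + v[0] * v[1]
print(second_order_mean(quad, [1.0, 2.0], [[0.5, 0.1], [0.1, 0.2]]))
```

In an application to ability estimation, `w` would map the item parameters to $E(\theta|x,\xi)$ or $E\{(\theta-m)^2|x,\xi\}$, `mode` would be the posterior mode, and `cov` the inverse Hessian of the negative log posterior.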

To apply this approximation in practice, we recommend replacing $p(\xi|z)$ by $p(\xi|y)$
in (15) and (16). This substitution will have a negligible effect since z contains data on
only one additional individual. Moreover, updating $p(\xi|z)$ for each new individual is not
only costly but would change the scoring criterion from one person to the next so that two
individuals with the same response x could have different estimates of $\theta$. This substitution
is obviously not necessary when the $\theta$ being estimated is for an individual in the calibration
sample. Computational expressions for $\hat w$, $\hat w_{rs}$ and $\tau_{rs}$ needed for 3PL are summarized in
the Appendices.

Numerical  Results 

The ability estimation will now be demonstrated using a sample of n=400 for
calibration and an additional sample of 100 to estimate 100 values of $\theta$. Both samples were drawn
from a larger sample of 1987 ACT math test results where K=40. Due to the fair number
of omitted responses, the samples were selected after deleting examinees who omitted the
last item or more than 10% of the items.

The computation of the posterior mode is based on the EM algorithm (Dempster,
Laird & Rubin, 1977) as implemented in Tsutakawa (1988) for the case in which the prior
of $\xi$ is derived from a Dirichlet distribution based on data from a similar test given in 1981.
This prior assigns a joint distribution to the probability of correct responses at three values
of $\theta$. When properly constrained, it induces a distribution on $\xi$. Although we use the same
prior as Tsutakawa (1988), the data used here have been resampled from the same 1987 test
after making the deletions mentioned above. We also use the parameterization
$\xi_j = (b_j,c_j,d_j)$, where $d_j = \log a_j$, $j=1,\ldots,K$, in order to enhance the asymptotic normality
of the posterior distribution of $\xi$, a condition which would make our approximation closer
to Lindley's.

Table 1 lists the posterior modes of the item parameters, together with the
approximate standard deviations and within-item correlations. The correlations between
items are used in the computation but not listed since they are quite numerous. These
results were used to compute the means and standard deviations of $\theta$, for the 100
individuals, which are plotted in Figure 1. The standard deviations are lowest when the
estimated $\theta$ are close to 0.25 and increase quite rapidly as the estimated $\theta$ departs from this
central location. This general pattern is to be expected since tests of this type are
generally designed to assess individuals whose abilities are close to the average.

In order to compare these results to those under ML and EB, the corresponding
estimates of $\theta$ and their standard deviations were computed using the methods outlined
above. Figure 1, which gives a plot of the values computed, shows that the standard
deviations under EB tend to be considerably smaller than those under Bayes. For $\theta > 0.5$,
the standard deviations under ML also tend to be considerably smaller than those under
Bayes, but for $\theta < 0.5$, the two procedures have comparable standard deviations on the
average.

In Figures 2 and 3 the ML and EB estimates of $\theta$ are plotted against the posterior
means. There is a general agreement over the interval from about -1.2 to 0. However, the
posterior means in the interval from 0 to 2 tend to be larger than the other two estimates.
For the more extreme values (which are listed but not plotted in Figure 2), there is a
tendency for the Bayes estimate to be pulled more towards the origin relative to ML. In
fact there was one individual having a perfect score whose $\theta$ cannot be computed under
ML.

The inferential effect that the different procedures have on the estimates and their
standard deviations may be illustrated in terms of interval estimates, defined here as the
estimate of $\theta$ ± 2 standard deviations. The end points of the interval estimates obtained by
ML and EB are plotted against the end points of the Bayesian or posterior intervals in
Figures 4 and 5 for the first 50 of the 100 examinees. It is quite apparent from these plots
that both ML and EB produce substantially shorter intervals than the Bayes intervals in
most cases. Since the intervals tend to become quite wide when the estimated $\theta$ departs
from the origin, the intervals were converted to percentile intervals, with a percentile
defined by $100\,\Phi(\theta)$ where $\Phi$ is the N(0,1) cdf. The corresponding plots, shown in Figures 6
and 7, further accent the narrower width of the ML and EB intervals.
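
The conversion from an interval on the ability scale to a percentile interval can be sketched as follows. The function names and the numerical inputs are hypothetical; the N(0,1) cdf is computed from the error function in the Python standard library.

```python
import math

def percentile(theta):
    """100 * Phi(theta), where Phi is the standard normal cdf."""
    return 100.0 * 0.5 * (1.0 + math.erf(theta / math.sqrt(2.0)))

def percentile_interval(estimate, sd):
    """Map the interval estimate +/- 2 sd onto the percentile scale."""
    return percentile(estimate - 2.0 * sd), percentile(estimate + 2.0 * sd)

# Hypothetical estimate and standard deviation for one examinee.
print(percentile_interval(0.25, 0.30))
```

Because the transformation is monotone, the ordering of the end points is preserved, while intervals far from the origin are compressed towards 0 or 100.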

In order to explain the differences observed in these graphs it is important to
consider the differences in the assumptions and information used. The Bayes and EB
methods both assume a N(0,1) prior on $\theta$. Thus the observed difference is not due to the
prior on $\theta$ but the use or nonuse of certain information from the calibration phase. Under
EB, the unknown $\xi$ is replaced by its estimate, without accounting for the error in the
estimate, resulting in a deflated standard deviation. On the other hand, ML makes no
distributional assumption about $\theta$, suggesting that its intervals should be wider than Bayes
(Lord, 1986). However, the fact that the estimated $\xi$ is treated as the true value again
deflates the standard deviation, but not to the extent of EB.


Discussion 

The main conclusion of this paper is that tests based on calibrations which produce
imprecise estimates of item parameters and ignore this imprecision can lead to serious
inferential errors. The discrepancy found here between the Bayes and conventional
methods for 3PL is more striking than that reported earlier by Tsutakawa & Soltys (1988)
for 2PL.

In large scale testing the sample size for calibration is typically substantially larger
than the 400 used here. However, increasing the size of the calibrating sample alone will
not increase the precision of the ability estimates. The major component of the
uncertainty, whether the inference is Bayesian or frequentist, is the randomness of the
individual response pattern x for a given $\theta$ and $\xi$. This uncertainty cannot be reduced
without increasing the number of items in the test.

There is a need for better approximations which are not only more accurate but
simple enough for routine use. Bayesian approximations, which are adaptable to modern
computer technology, are only beginning to appear and give promise for widespread
Bayesian applications in testing.

For inferential purposes it is important to distinguish the sampling variance of the
ability estimator for individuals with ability $\theta$ and the posterior variance of $\theta$ for an
individual with response x. The former may be interpreted as the variance before
observing x among those with ability $\theta$ and the latter as the (subjective) posterior variance
after observing x. Since $\theta$ is unknown, the former is unknown, but it is common practice to
estimate it by replacing $\theta$ with its maximum likelihood estimate based on x. Since this
estimate is unreliable, so is this variance estimate. On the other hand, the posterior
variance is a measure of uncertainty we have about a particular individual's $\theta$ after
observing x. The subjectivity of this measure enters only through the choice of the prior
distribution for the item parameters. This subjectivity should not be a controversial issue
when the choice of the prior is based on past tests, as in the illustration used here. If one is
interested in the probable values of $\theta$, after x has been realized, the Bayesian approach is
the logical choice.
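Appendix A: Expressions for Computing the Empirical Information Matrix

To simplify the notation let $\xi_j = (\xi_{j1},\xi_{j2},\xi_{j3}) = (b_j,c_j,d_j)$ and

$$P_{ij} = P_{\xi_j}(\theta)^{y_{ij}}\{1-P_{\xi_j}(\theta)\}^{1-y_{ij}}.$$

Now define

$$\begin{aligned}
g_s(i,j,\theta) &= \partial\log P_{ij}/\partial\xi_{js},\\
g_{st}(i,j,\theta) &= \partial^2\log P_{ij}/\partial\xi_{js}\partial\xi_{jt},\\
h_{st}(i,j,\theta) &= (\partial^2 P_{ij}/\partial\xi_{js}\partial\xi_{jt})/P_{ij},
\end{aligned}$$

for $i = 1,\ldots,n$; $j = 1,\ldots,K$; $s,t = 1,2,3$. We then have

$$h_{st}(i,j,\theta) = g_{st}(i,j,\theta) + g_s(i,j,\theta)g_t(i,j,\theta). \tag{19}$$

Define, for notational convenience only,

$$\begin{aligned}
z_{\theta j} &= \exp(d_j)(\theta-b_j),\\
\phi_{\theta j} &= \{1+\exp(-z_{\theta j})\}^{-1},\\
\psi_{\theta j} &= 1-\phi_{\theta j},\\
\lambda_{\theta j} &= \{1+c_j\exp(-z_{\theta j})\}^{-1},\\
\eta_{\theta j} &= \{c_j+\exp(z_{\theta j})\}^{-1}.
\end{aligned}$$

Then the first two derivatives of $\log P_{ij}$ may be expressed by

$$\begin{aligned}
g_1(i,j,\theta) &= -\exp(d_j)(y_{ij}\lambda_{\theta j}-\phi_{\theta j}),\\
g_2(i,j,\theta) &= y_{ij}[\eta_{\theta j}-1/(c_j-1)] + 1/(c_j-1),\\
g_3(i,j,\theta) &= z_{\theta j}(y_{ij}\lambda_{\theta j}-\phi_{\theta j}),\\
g_{11}(i,j,\theta) &= \exp(2d_j)(c_j y_{ij}\lambda_{\theta j}\eta_{\theta j}-\phi_{\theta j}\psi_{\theta j}),\\
g_{12}(i,j,\theta) &= y_{ij}\exp(d_j)\lambda_{\theta j}\eta_{\theta j},\\
g_{13}(i,j,\theta) &= -\exp(d_j)(y_{ij}\lambda_{\theta j}-\phi_{\theta j}) + z_{\theta j}\exp(d_j)[\phi_{\theta j}\psi_{\theta j}-c_j y_{ij}\lambda_{\theta j}\eta_{\theta j}],\\
g_{22}(i,j,\theta) &= -[y_{ij}\eta_{\theta j}^2 + (1-y_{ij})/(c_j-1)^2],\\
g_{23}(i,j,\theta) &= -z_{\theta j}y_{ij}\lambda_{\theta j}\eta_{\theta j},\\
g_{33}(i,j,\theta) &= z_{\theta j}^2(c_j y_{ij}\lambda_{\theta j}\eta_{\theta j}-\phi_{\theta j}\psi_{\theta j}) + z_{\theta j}(y_{ij}\lambda_{\theta j}-\phi_{\theta j}).
\end{aligned}$$

Then $h_{st}(i,j,\theta)$ may be expressed in terms of the expressions for $g_s$ and $g_{st}$ through
equation (19).

Now define the conditional posterior expectations (given $\xi$) of these functions by

$$\begin{aligned}
\bar g_s(i,j) &= \int_{-\infty}^{\infty} g_s(i,j,\theta)p(\theta|y_i,\xi)\,d\theta,\\
\bar h_{st}(i,j) &= \int_{-\infty}^{\infty} h_{st}(i,j,\theta)p(\theta|y_i,\xi)\,d\theta,\\
\bar g_{st}(i,j,j') &= \int_{-\infty}^{\infty} g_s(i,j,\theta)g_t(i,j',\theta)p(\theta|y_i,\xi)\,d\theta,
\end{aligned}$$

for $i = 1,\ldots,n$; $j,j' = 1,\ldots,K$; and $s,t = 1,2,3$, where $p(\theta|y_i,\xi) = p_\xi(y_i|\theta)\phi(\theta)/p(y_i|\xi)$, the
posterior pdf of $\theta_i$ given $y_i = (y_{i1},\ldots,y_{iK})$.

Then, finally, the first and second partial derivatives of the loglikelihood function
$L(\xi) = \log p(y|\xi)$ are given by

$$\partial L(\xi)/\partial\xi_{js} = \sum_{i=1}^{n}\bar g_s(i,j),$$

and

$$\partial^2 L(\xi)/\partial\xi_{js}\partial\xi_{j't} =
\begin{cases}
\sum_{i=1}^{n}\{\bar h_{st}(i,j)-\bar g_s(i,j)\bar g_t(i,j)\} & \text{if } j=j',\\
\sum_{i=1}^{n}\{\bar g_{st}(i,j,j')-\bar g_s(i,j)\bar g_t(i,j')\} & \text{if } j\neq j',
\end{cases}$$

for $j,j'=1,\ldots,K$; $s,t = 1,2,3$. The empirical information matrix is the negative of
the 3K×3K 2nd derivative matrix, i.e.,

$$I(\xi) = -\left[\partial^2 L(\xi)/\partial\xi_{js}\partial\xi_{j't}\right].$$

To complete the expressions for the Hessian of the negative log posterior, we must
add to $I(\xi)$ the second partials of the negative log prior, expressions for which are
summarized in Tsutakawa (1988). A quadrature method for the required numerical
integration is given in Tsutakawa (1984).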


Appendix B: Expressions for Computing the Second Derivatives of $w(\xi)$

The functions $w(\xi)$ used in the approximations have the form

$$w(\xi) = \int f(\theta_i)p(\theta_i|y_i,\xi)\,d\theta_i,$$

where $f(\theta_i) = \theta_i$ for the posterior mean and $f(\theta_i) = (\theta_i-m)^2$ for the posterior variance. By
interchanging the order of differentiation and integration, the required second derivatives
of $w(\xi)$ evaluated at $\xi = \hat\xi$ have the form

$$w_{st}(j,j') = \int f(\theta_i)\{\partial^2 p(\theta_i|y_i,\xi)/\partial\xi_{js}\partial\xi_{j't}\}\,d\theta_i$$

for $j,j' = 1,\ldots,K$ and $s,t = 1,2,3$. (Additional subscripts have been introduced here in
order to identify the nesting of the item parameters within items.)

Upon evaluating the second derivatives of $p(\theta_i|y_i,\xi)$, the computational expressions
reduce to

$$\begin{aligned}
w_{st}(j,j) ={}& E\{f(\theta_i)h_{st}(i,j,\theta_i)\} - \bar g_t(i,j)E\{f(\theta_i)g_s(i,j,\theta_i)\} - \bar g_s(i,j)E\{f(\theta_i)g_t(i,j,\theta_i)\}\\
&+ E\{f(\theta_i)\}\{2\bar g_s(i,j)\bar g_t(i,j)-\bar h_{st}(i,j)\}
\end{aligned}$$

and, for $j\neq j'$,

$$\begin{aligned}
w_{st}(j,j') ={}& E\{f(\theta_i)g_s(i,j,\theta_i)g_t(i,j',\theta_i)\} - \bar g_t(i,j')E\{f(\theta_i)g_s(i,j,\theta_i)\} - \bar g_s(i,j)E\{f(\theta_i)g_t(i,j',\theta_i)\}\\
&+ E\{f(\theta_i)\}\{2\bar g_s(i,j)\bar g_t(i,j')-\bar g_{st}(i,j,j')\}
\end{aligned}$$

for $j,j' = 1,\ldots,K$ and $s,t = 1,2,3$, where $E\{\cdot\}$ denotes expectation with respect to the
posterior pdf $p(\theta_i|y_i,\xi)$.


References 


Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's
ability. In F.M. Lord & M.R. Novick (Eds.), Statistical Theories of Mental Test
Scores. Reading, MA: Addison-Wesley.

Birnbaum, A. (1969). Statistical theory for logistic mental test models with prior
distribution of ability. Journal of Mathematical Psychology, 6, 258-276.

Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item
parameters: An application of an EM algorithm. Psychometrika, 46, 443-459.

Deely, J.J., & Lindley, D.V. (1981). Bayes empirical Bayes. Journal of the American
Statistical Association, 76, 833-841.

DeGroot, M.H. (1986). Probability and Statistics, 2nd ed. Reading, MA: Addison-Wesley.

Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete
data via the EM algorithm (with discussion). Journal of the Royal Statistical
Society, Series B, 39, 1-38.

Lindley, D.V. (1980). Approximate Bayesian methods. Trabajos Estadistica, 31, 223-237.

Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems.
Hillsdale, NJ: Erlbaum.

Lord, F.M. (1986). Maximum likelihood and Bayesian parameter estimation in item
response theory. Journal of Educational Measurement, 23, 157-162.

Mislevy, R.J., & Bock, R.D. (1984). BILOG: Item analysis and test scoring with binary
logistic models. Mooresville, IN: Scientific Software.

Tsutakawa, R.K. (1984). Estimation of two-parameter logistic item response curves.
Journal of Educational Statistics, 9, 263-276.

Tsutakawa, R.K. (1988). Dirichlet prior in Bayesian estimation of item response curves.
Mathematical Sciences Technical Report No. 143, Department of Statistics,
University of Missouri, Columbia, MO.

Tsutakawa, R.K., & Soltys, M.J. (1988). Approximations for Bayesian ability estimation.
Journal of Educational Statistics, 13, in press.

Wingersky, M.S., Barton, M.A., & Lord, F.M. (1982). LOGIST user's guide. Princeton,
NJ: Educational Testing Service.


TABLE 1

Summary of Posterior Distribution of Item Parameters

Column headings: Item; Posterior Mean (b, c, d); Posterior SD (b, c, d); Posterior
Correlation (bc, bd, cd); Item Score. [Table entries not recoverable.]


FIGURE 2

Maximum Likelihood vs Bayes Estimates


FIGURE 3

Empirical Bayes vs Bayes Estimates


FIGURE 4

Interval Estimates of $\theta$ under Maximum Likelihood vs. Bayes for 50 Examinees


FIGURE 5

Interval Estimates of $\theta$ under Empirical Bayes vs. Bayes for 50 Examinees


FIGURE 6

Interval Estimates of Percentile under ML vs Bayes for 50 Examinees


FIGURE 7

Interval Estimates of Percentile under Empirical Bayes vs Bayes for 50 Examinees